Abstract

We develop a multi-scale theory for groups of diffeomorphisms based on
previous works [BGBHR11, RVW+11, SNLP11, SLNP11]. The purpose of the paper
is to develop in detail a variational approach to multi-scale analysis on
diffeomorphisms and to generalize the semi-direct product representation of
groups to several scales. We also show that the approaches presented in
[SNLP11, SLNP11] and the mixture of kernels of [RVW+11] are equivalent.

1 Introduction 

In this paper, we develop a multi-scale theory for groups of diffeomorphisms in
the context of image registration. The setting of large deformation diffeomorphic
matching was introduced in the seminal papers [Tro98, DGM98], and this approach
has been applied in the field of computational anatomy [GM98]. The initial
problem deals with the diffeomorphic registration of two given biomedical
images or shapes. An important aspect of this model is the use of Reproducing
Kernel Hilbert Spaces (RKHS) of vector-valued functions to define the Lie
algebra of the group of diffeomorphisms. Although our context is very different
from kernel-based methods for machine learning [HTF09], some analogies may
be worth noticing. We now present the model developed in [BMTY05].



Definition 1.1. Let $\Omega$ be a domain in $\mathbb{R}^d$. An admissible Reproducing Kernel
Hilbert Space $H$ of vector fields is a Hilbert space of $C^1(\Omega)$ vector fields such
that there exists a positive constant $M$ such that for every $v \in H$ the following
inequality holds:

\[ \|v\|_{1,\infty} \le M \|v\|_H . \tag{1} \]

Remark 1.2. The reproducing kernel associated with the space $H$ is given by
$K(x,y) = (\delta_x, K\delta_y)$, where $\delta_x, \delta_y$ are the pointwise evaluation maps, which are
linear forms by assumption on the space $H$. The kernel completely defines the
space $H$.

The diffeomorphism group $G$ associated with the RKHS $H$ is given by
$\{\varphi_1 \mid v \in L^2([0,1],H)\}$, where $\varphi_t$ is the flow of $v$, i.e.

\[ \partial_t \varphi_t = v(t) \circ \varphi_t, \qquad \varphi_0 = \mathrm{Id}. \tag{2} \]

The diffeomorphic matching problem is the minimization of the functional

\[ \mathcal{F}(v) = \int_0^1 \|v(t)\|_H^2 \, dt + d(\varphi_1.q_0, q_{\mathrm{target}}), \tag{3} \]

where $v \in L^2([0,1],H)$ and $q_0$, $q_{\mathrm{target}}$ are objects of interest such as groups of
points, measures, currents or images. The action of the group $G$ on the space of
objects is denoted by $\varphi.q$, where $\varphi$ is an element of the group and $q$ is an object.
The distance function $d$ that enforces the matching accuracy is usually taken
to be the square of the norm if the objects live in a normed vector space; e.g.
for images one would use the $L^2$ norm, $d(q_0,q_1) = \int_\Omega |q_0(x) - q_1(x)|^2 \, dx$. This
minimization problem enables us to match images via geodesics on the group
$G$, if we endow $G$ with the right-invariant metric obtained by translating the
inner product $\langle \cdot, \cdot \rangle_H$ on the Lie algebra $H$ to the other tangent spaces. More
importantly, through its action on the space of images, the right-invariant metric
on the group induces a metric on the orbits of the image space, and the final
deformation is completely encoded in the so-called initial momentum [VMTY04].
This initial momentum has the same dimension as the image. Since it is an
element of a linear space, statistics can also be done on it [SFP+10].



An RKHS corresponding to a Gaussian kernel is commonly used in practice.
The choice of its standard deviation $\sigma$ is an important problem and describes
a trade-off between the smoothness of the deformation and the matching
accuracy. Indeed, a large standard deviation produces very smooth deformations
with a poor matching accuracy for structures of size smaller than $\sigma$.
On the contrary, a small standard deviation results in a good matching accuracy,
but the deformations may present undesirable, very large Jacobians.



It is therefore a natural step to introduce a mixture of Gaussian kernels with
different standard deviations. In [RVW+11], the authors show that such kernels
outperform single Gaussian kernels when registering images containing
features of interest at several scales simultaneously: they provide a good matching
quality, while keeping the diffeomorphisms smooth. Naturally, this introduces
more parameters in the algorithm, which need to be tuned. Practical insights
about how to parametrize the scales and weights of multiple kernels are given in
[RVW+11]. The idea of using a mixture of kernels for matching is directly
connected to [BGBHR11], where it is proven that there is an equivalence between
the matching with a sum of two kernels and the matching via a semi-direct
product of two groups.

The work on the metric underlying the LDDMM methodology [BMTY05]
has also been followed up by [SNLP11, SLNP11], where the authors introduce
the notion of a bundle of kernels and argue that this general framework can be
used to deal with multi-scale LDDMM. In passing, we prove that their approach
reduces to the mixture of kernels. We give a self-contained and simple proof
of this result based on Lagrange multipliers and we develop an extension of
[BGBHR11], first to a finite number of scales and then to a continuum of scales.



The paper is divided into three parts: the first part focuses on a finite number 
of scales while the second treats the case of a continuum of scales. The last part 
of the paper is devoted to numerical simulations, where we show in particular 
the decomposition of the optimized diffeomorphism on the given scales.



2 A finite number of scales 



2.1 The finite mixture of kernels 

For the sake of simplicity, we first treat the case of a finite set of admissible
Hilbert spaces $H_i$ for $i = 1, \dots, k$. Denoting $H = H_1 + \dots + H_k$, the norm
proposed in [SNLP11] as well as in [BGBHR11] is defined by

\[ \|v\|_H^2 = \inf \Big\{ \sum_{i=1}^k \|v_i\|_{H_i}^2 \;\Big|\; \sum_{i=1}^k v_i = v \Big\}. \tag{4} \]



The following lemma is the main argument to prove the equivalence between
the approaches of [SNLP11] and [RVW+11]. This lemma is an old result that
can be found in [Aro50]. However, for the sake of completeness, we present a
simple proof based on the Lagrange multiplier rule. Moreover, if one wants to
skip the technical details of the proof, the formal application of the Lagrange
multiplier rule gives the result immediately. We outline the formal proof: if
one has to minimize the norm defined in Formula (4), then one can introduce a
Lagrange multiplier $p$ and obtain a stationary point of the Lagrangian

\[ L_v(v_1, \dots, v_k, p) = \frac{1}{2} \sum_{i=1}^k \|v_i\|_{H_i}^2 + \Big( p, \, v - \sum_{i=1}^k v_i \Big), \tag{5} \]

which gives $v_i = K_i p$ and

\[ v = \sum_{i=1}^k K_i p. \tag{6} \]

Hence, this gives a heuristic argument for why the problem of optimizing at several
scales simultaneously reduces to a mixture of kernels.

Lemma 2.1. The formula (4) defines a scalar product on $H$ which makes $H$ a
RKHS, and its associated kernel is $K := \sum_{i=1}^k K_i$, where $K_i$ denotes the kernel
of the space $H_i$.

Proof. Let $x \in \Omega$, $\alpha \in \mathbb{R}^d$ and let $\delta_x^\alpha$ be the pointwise evaluation defined by
$\delta_x^\alpha(v) = \langle v(x), \alpha \rangle_{\mathbb{R}^d}$. By hypothesis on each $H_i$, $\delta_x^\alpha$ is a continuous linear form,
which implies that

\[ \mathrm{ev}_{(x,\alpha)} : \bigoplus_{i=1}^k H_i \ni (v_i) \mapsto \sum_{i=1}^k \delta_x^\alpha(v_i) \in \mathbb{R} \]

is also a continuous linear form on $\bigoplus_{i=1}^k H_i$. Now, for any $v$ that can be written as
$v = \sum_{i=1}^k v_i$ for $(v_i) \in \bigoplus_{i=1}^k H_i$, there exists a unique $(v_i) \in \bigoplus_{i=1}^k H_i$ minimizing the
functional $N((v_i)_{i=1 \dots k}) = \frac{1}{2} \sum_{i=1}^k \|v_i\|_{H_i}^2$ and satisfying $\sum_{i=1}^k v_i = v$. This
unique element is given by the projection theorem for Hilbert spaces [Bre83]
applied to

\[ V := \Big( \bigcap_{(x,\alpha) \in \Omega \times \mathbb{R}^d} \ker \mathrm{ev}_{(x,\alpha)} \Big)^{\perp}, \]

which is a closed non-empty subspace of $\bigoplus_{i=1}^k H_i$. Therefore $H$ is isometric to $V$
and hence it is a Hilbert space.

In order to identify the kernel of $H$, we apply the Lagrange multiplier rule.
An optimal $(v_i) \in \bigoplus_{i=1}^k H_i$ corresponds to a stationary point of the augmented
functional

\[ \tilde{N}(p, (v_i)_{i=1 \dots k}) = \frac{1}{2} \sum_{i=1}^k \|v_i\|_{H_i}^2 + \Big( p, \, v - \sum_{i=1}^k v_i \Big)_{H^*,H}. \tag{7} \]

Remark that the norm $\|\cdot\|_H$ makes the injection $j_i : H_i \hookrightarrow H$ continuous and,
as a consequence, $j_i^* : H^* \to H_i^*$ is defined by duality. Therefore the pairing
$(p, v_i)$ is well defined in Formula (7). Then, at a stationary point $(p, (v_i)_{i=1 \dots k})$
we have

\[ \begin{cases} v_i = K_i(p) \quad \text{for } i = 1 \dots k, \\ v = \sum_{i=1}^k v_i. \end{cases} \tag{8} \]



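Note that in the previous formula, we could have written the heavier notation
$v_i = K_i(j_i^* p)$ to be more precise. Taking the dual pairing of the last equation
with $p$ we get

\[ (p, v) = \sum_{i=1}^k \|v_i\|_{H_i}^2 = \sum_{i=1}^k (p, K_i p) = \|p\|_{H^*}^2, \]

which implies that the Riesz isomorphism between $H^*$ and $H$ is given by the
map $p \in H^* \mapsto \sum_{i=1}^k K_i(p) \in H$. Moreover, we have $\bigcap_{i=1}^k H_i^* \subset H^*$. For
$p \in \bigcap_{i=1}^k H_i^*$ we have

\[ |(p, v)| \le \sum_{i=1}^k |(p, v_i)| \le \sum_{i=1}^k \|p\|_{H_i^*} \|v_i\|_{H_i} \le \sqrt{\sum_{i=1}^k \|p\|_{H_i^*}^2} \, \sqrt{\sum_{i=1}^k \|v_i\|_{H_i}^2}, \]

which is true for any decomposition of $v$, so that

\[ |(p, v)| \le \sqrt{\sum_{i=1}^k \|p\|_{H_i^*}^2} \; \|v\|_H. \tag{9} \]

Since $\delta_x^\alpha \in \bigcap_{i=1}^k H_i^* \subset H^*$, $H$ is a RKHS and its kernel is $K = \sum_{i=1}^k K_i$. □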
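Lemma 2.1 can be sanity-checked in finite dimensions. The following is a minimal sketch and not part of the original argument: it assumes that each $H_i$ is $\mathbb{R}^m$ with norm $\|v\|_{H_i}^2 = v^\top K_i^{-1} v$ for a symmetric positive-definite matrix $K_i$ (a stand-in for a kernel), and the helper names `random_spd`, `split_energy` and `optimal_split` are hypothetical. It checks that the decomposition $v_i = K_i p$ with $p = (\sum_i K_i)^{-1} v$ attains the norm associated with the sum kernel.

```python
import numpy as np

def random_spd(n, rng):
    """Random symmetric positive-definite matrix (stand-in for a kernel)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

def split_energy(parts, kernels):
    """Sum of squared norms sum_i v_i^T K_i^{-1} v_i of a decomposition."""
    return sum(float(v @ np.linalg.solve(K, v)) for v, K in zip(parts, kernels))

def optimal_split(v, kernels):
    """Decomposition v_i = K_i p with multiplier p = (sum_i K_i)^{-1} v."""
    p = np.linalg.solve(sum(kernels), v)
    return [K @ p for K in kernels], p

rng = np.random.default_rng(0)
kernels = [random_spd(5, rng) for _ in range(3)]
v = rng.standard_normal(5)

parts, p = optimal_split(v, kernels)
# Squared norm of v in the space whose kernel is the sum of the K_i.
mixture_norm_sq = float(v @ np.linalg.solve(sum(kernels), v))
print(split_energy(parts, kernels), mixture_norm_sq)
```

Any other decomposition of the same `v` (for instance shifting a vector from one part to another) should give a strictly larger `split_energy`, which is the finite-dimensional content of formula (4).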

We now define the isometric injection of $H$ in $\bigoplus_{i=1}^k H_i$.

Definition 2.2. We denote by $\pi : H \to \bigoplus_{i=1}^k H_i$ the map defined by $\pi(v) = (v_i)_{i=1 \dots k}$,
where $(v_i)$ is the unique minimizing decomposition of $v$ obtained in the proof of
Lemma 2.1.

The non-linear version of this multi-scale approach to diffeomorphic matching
problems is the minimization of

\[ \mathcal{E}(v) = \int_0^1 \sum_{i=1}^k \|v_i(t)\|_{H_i}^2 \, dt + d(\varphi_1.q_0, q_{\mathrm{target}}), \tag{10} \]

defined on $L^2([0,1], \bigoplus_{i=1}^k H_i)$. Recall that $\varphi_t$ is the flow generated by $v(t) := \sum_{i=1}^k v_i(t)$.
The direct consequence of Lemma 2.1 is the following proposition.

Proposition 2.3. The minimization of $\mathcal{E}$ reduces to the minimization of

\[ \mathcal{F}(v) = \int_0^1 \|v(t)\|_H^2 \, dt + d(\varphi_1.q_0, q_{\mathrm{target}}). \tag{11} \]

Proof. Obviously, the minimization of $\mathcal{F}$ is the minimization of $\mathcal{E}$ restricted
to $\pi(H)$. Remark first that for any $(v_i) \in L^2([0,1], \bigoplus_{i=1}^k H_i)$, denoting
$v = \sum_{i=1}^k v_i$, we have $\pi(v) \in L^2([0,1], \bigoplus_{i=1}^k H_i)$ and $\|\pi(v)\|_{L^2}^2 \le \|(v_i)\|_{L^2}^2$, with
equality if and only if $\pi(v) = (v_i)$. Therefore, if $(v_i) \in L^2([0,1], \bigoplus_{i=1}^k H_i)$ is a
minimizer of $\mathcal{E}$, then

\[ \mathcal{E}(\pi(v)) \le \mathcal{E}((v_i)), \tag{12} \]

which implies $\pi(v) = (v_i)$ and the result. □

Remark 2.4. To a minimizing path $v(t) \in L^2([0,1],H)$ corresponds a
minimizing path in $\bigoplus_{i=1}^k H_i$ via the map $\pi$. In other words, the optimal path
can be decomposed on the different scales using each kernel present in the
reproducing kernel of $H$, $K = \sum_{i=1}^k K_i$.
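The decomposition of Remark 2.4 can be illustrated on a point cloud. This is a minimal sketch under stated assumptions, not the authors' implementation: the points are one-dimensional, the kernels are scalar Gaussians with hypothetical widths `sigmas`, and the momentum `p` is a random vector attached to the points. Each scale component is $v_i = K_i p$, and its squared norm is $p^\top K_i p$.

```python
import numpy as np

def gaussian_gram(x, sigma):
    """Gram matrix of the Gaussian kernel exp(-|x - y|^2 / (2 sigma^2))."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

x = np.linspace(0.0, 1.0, 8)           # sample points (assumed 1D)
sigmas = [0.5, 0.1, 0.02]              # hypothetical coarse-to-fine widths
grams = [gaussian_gram(x, s) for s in sigmas]

rng = np.random.default_rng(1)
p = rng.standard_normal(x.size)        # momentum carried by the points

components = [G @ p for G in grams]    # v_i = K_i p, evaluated at the points
v = sum(components)                    # velocity of the sum kernel
energies = [float(p @ G @ p) for G in grams]   # ||v_i||^2 in H_i
total_energy = float(p @ sum(grams) @ p)       # ||v||^2 in H

for s, e in zip(sigmas, energies):
    print(f"sigma={s}: share of energy {e / total_energy:.3f}")
```

The printed shares show how much of the total energy each scale carries for this particular momentum; they add up to one because the energies of the components sum to the energy for the sum kernel.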



2.2 Iterated semi-direct product 

Until now, the scales have been introduced only on the Lie algebra of the
diffeomorphism group, and an important question is how to decompose the flow
of diffeomorphisms according to these scales. An answer in the case of two
scales is given in [BGBHR11], where the flow is decomposed with the help of
a semi-direct product of groups and the whole transformation is encoded in a
large-scale deformation and a small-scale one. The underlying idea is to
represent the diffeomorphism flow $\varphi(t)$ by a pair $(\psi_1(t), \psi_2(t))$, where $\psi_1(t)$ is given
by the flow of the vector field $v_1(t)$ and $\psi_2(t) := \varphi(t) \circ (\psi_1(t))^{-1}$. Remark in
particular that $\psi_2(t)$ is not the flow of $v_2(t)$. More precisely, we have

\[ \begin{cases} \partial_t \psi_1(t) = v_1(t) \circ \psi_1(t), \\ \partial_t \psi_2(t) = (v_1(t) + v_2(t)) \circ \psi_2(t) - \mathrm{Ad}_{\psi_2(t)} v_1(t) \circ \psi_2(t). \end{cases} \tag{13} \]



Interestingly, as shown in [BGBHR11], this decomposition of the diffeomorphism
flow corresponds to a semi-direct product of groups. This framework can be
generalized to a finite number of scales as follows.

Given $n$ scales, we want to represent $\varphi(t)$ by an $n$-tuple $(\psi_1(t), \dots, \psi_n(t))$,
with $\psi_1(t)$ corresponding to the finest scale and $\psi_n(t)$ to the coarsest scale. The
geometric construction underlying the decomposition into multiple scales is the
semidirect product, introduced in the following lemma.

Lemma 2.5. Let $G_1 \supset G_2 \supset \dots \supset G_n$ be a chain of Lie groups. The $n$-fold
semidirect product is given by the multiplication

\[ (g_1, \dots, g_n) \cdot (h_1, \dots, h_n) = (g_1 \, c_{g_2 \cdots g_n} h_1, \; g_2 \, c_{g_3 \cdots g_n} h_2, \; \dots, \; g_n h_n) \]

with $c_g h = g h g^{-1}$ denoting conjugation. Then, given a right-invariant velocity
field $v(t) = (v_1(t), \dots, v_n(t))$, the curve $g(t)$ is reconstructed via

\[ \partial_t g_k(t) = \Big( v_k(t) + (\mathrm{Id} - \mathrm{Ad}_{g_k(t)}) \sum_{i=k+1}^n v_i(t) \Big) g_k(t) \tag{14} \]

if $k \le n-1$, and $\partial_t g_n(t) = v_n(t) g_n(t)$, which is the semidirect product equivalent of
$\partial_t g(t) = v(t) g(t)$. We shall denote this semidirect product by

\[ G_1 \rtimes \cdots \rtimes G_n \]

to emphasize that each subproduct $G_k \rtimes \cdots \rtimes G_n$ is a normal subgroup of the
whole product.

Proof. Verifying the axioms for the group multiplication is a straightforward, if
slightly lengthy, calculation. The inverse is given by

\[ (g_1, \dots, g_n)^{-1} = \big( c_{(g_2 \cdots g_n)^{-1}} g_1^{-1}, \; c_{(g_3 \cdots g_n)^{-1}} g_2^{-1}, \; \dots, \; g_n^{-1} \big). \]

The right hand side of equation (14) can be obtained by differentiating the
group multiplication at the identity, i.e. computing $\partial_t (h(t) \cdot g)|_{t=0}$ with $g$ fixed,
$h(0) = e$ and $\partial_t h(t)|_{t=0} = v$. □
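The group axioms of Lemma 2.5 can be checked numerically on matrices. This is a minimal sketch under an explicit assumption: every $G_k$ is taken to be the same group of invertible $3 \times 3$ matrices (a trivial chain), so that conjugation is $c_g h = g h g^{-1}$ in the matrix sense. The function names are hypothetical.

```python
import numpy as np
from functools import reduce

def conj(g, h):
    """Conjugation c_g h = g h g^{-1}."""
    return g @ h @ np.linalg.inv(g)

def product(mats, dim):
    """Ordered product of a list of matrices (identity for an empty list)."""
    return reduce(np.matmul, mats, np.eye(dim))

def multiply(gs, hs):
    """(g.h)_k = g_k c_{g_{k+1}...g_n} h_k, the multiplication of Lemma 2.5."""
    dim = gs[0].shape[0]
    return [g @ conj(product(gs[k + 1:], dim), h)
            for k, (g, h) in enumerate(zip(gs, hs))]

def inverse(gs):
    """(g^{-1})_k = c_{(g_{k+1}...g_n)^{-1}} g_k^{-1}."""
    dim = gs[0].shape[0]
    return [conj(np.linalg.inv(product(gs[k + 1:], dim)), np.linalg.inv(g))
            for k, g in enumerate(gs)]

def random_element(n_scales, dim, rng):
    """Tuple of matrices close to the identity, hence invertible."""
    return [np.eye(dim) + 0.1 * rng.standard_normal((dim, dim))
            for _ in range(n_scales)]

def close(xs, ys):
    return all(np.allclose(x, y) for x, y in zip(xs, ys))

rng = np.random.default_rng(3)
a, b, c = (random_element(3, 3, rng) for _ in range(3))
identity = [np.eye(3)] * 3

print(close(multiply(multiply(a, b), c), multiply(a, multiply(b, c))))
print(close(multiply(a, inverse(a)), identity))
```

Both checks rely only on the multiplication rule and the inverse formula stated above; they do not test the diffeomorphism setting itself.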

In our case the group $G_k$ is the diffeomorphism group $\mathrm{Diff}_{K_k}(\Omega)$ generated
by vector fields in the space $H_{K_k}$ corresponding to the kernel $K_k$. The subgroup
condition $\mathrm{Diff}_{K_k}(\Omega) \supset \mathrm{Diff}_{K_{k+1}}(\Omega)$ is satisfied if we impose the corresponding
condition $H_{K_k} \supset H_{K_{k+1}}$ on the spaces of vector fields, which we will assume
from now on.

Starting from an $n$-tuple $v_1(t), \dots, v_n(t)$ of vector fields, we can reconstruct
the diffeomorphisms at each scale via

\[ \partial_t \psi_k(t) = \Big( v_k(t) + (\mathrm{Id} - \mathrm{Ad}_{\psi_k(t)}) \sum_{i=k+1}^n v_i(t) \Big) \circ \psi_k(t) \tag{15} \]

as in Lemma 2.5. We can also sum over all scales to form $v(t) = \sum_{k=1}^n v_k(t)$
and compute the flow $\varphi(t)$ of $v(t)$. Then a simple calculation shows that

\[ \varphi(t) = \psi_1(t) \circ \dots \circ \psi_n(t). \]

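This construction can be summarized by the following commuting diagram

\[
\begin{array}{ccc}
(v_1(t), \dots, v_n(t)) & \xrightarrow{\; v = \sum_k v_k \;} & v(t) \\
\Big\downarrow {\scriptstyle (15)} & & \Big\downarrow {\scriptstyle \partial_t \varphi(t) = v(t) \circ \varphi(t)} \\
(\psi_1(t), \dots, \psi_n(t)) & \xrightarrow{\; \varphi = \psi_1 \circ \dots \circ \psi_n \;} & \varphi(t)
\end{array}
\tag{16}
\]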
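The commuting diagram can be tested on a linear toy model. This is a minimal sketch under stated assumptions: the vector fields are linear and time-independent, $v_k(x) = A_k x$, so flows are matrices, composition is matrix multiplication and $\mathrm{Ad}_g B = g B g^{-1}$; the sum in (15) is taken over the coarser scales $i > k$. The integrator `rk4` and the other names are hypothetical helpers.

```python
import numpy as np
from functools import reduce

def rk4(rhs, y0, t1=1.0, steps=200):
    """Classical Runge-Kutta scheme for the autonomous ODE y' = rhs(y)."""
    y, h = np.array(y0, dtype=float), t1 / steps
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y

def scale_flows(A, steps=200):
    """Integrate psi_k' = (A_k + S_k) psi_k - psi_k S_k, S_k = sum_{i>k} A_i.

    For matrices this is psi_k' = (A_k + (Id - Ad_{psi_k}) S_k) psi_k.
    """
    A = np.asarray(A, dtype=float)
    n, d, _ = A.shape
    tails = np.stack([A[k + 1:].sum(axis=0) for k in range(n)])
    rhs = lambda psi: (A + tails) @ psi - psi @ tails
    return rk4(rhs, np.stack([np.eye(d)] * n), steps=steps)

def total_flow(A, steps=200):
    """Flow of the summed field: phi' = (sum_k A_k) phi, phi(0) = Id."""
    total = np.asarray(A, dtype=float).sum(axis=0)
    return rk4(lambda phi: total @ phi, np.eye(total.shape[0]), steps=steps)

rng = np.random.default_rng(2)
A = 0.5 * rng.standard_normal((3, 3, 3))    # three scales, 3x3 generators
psi = scale_flows(A)
composed = reduce(np.matmul, psi)           # psi_1 psi_2 psi_3 at time 1
print(np.max(np.abs(composed - total_flow(A))))
```

The printed discrepancy should be at the level of the time-stepping error, which supports the identity $\varphi(t) = \psi_1(t) \circ \dots \circ \psi_n(t)$ in this simplified setting.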

We can now formulate several equivalent versions of LDDMM matching with
multiple scales. The most straightforward way is to do matching with a kernel
which is a sum of kernels of different scales. This is the approach considered in
[RVW+11].

Definition 2.6 (LDDMM with Sum-of-Kernels). Registering $I_0$ to $I_{\mathrm{targ}}$ is done
by finding the minimizer $v(t)$ of

\[ \frac{1}{2} \int_0^1 \|v(t)\|_K^2 \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_{\mathrm{targ}}\|^2, \]

where $K = \sum_{i=1}^n K_i$ is a sum of $n$ kernels.

The corresponding simultaneous multiscale registration problem, where one
assigns to each scale a separate vector field, is a special case of the kernel bundle
method proposed in [SNLP11] and [SLNP11].



Definition 2.7 (Simultaneous Multiscale Registration). Registering $I_0$ to $I_{\mathrm{targ}}$
is done by finding the minimizing $n$-tuple $(v_1(t), \dots, v_n(t))$ of

\[ \frac{1}{2} \sum_{i=1}^n \int_0^1 \|v_i(t)\|_{K_i}^2 \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_{\mathrm{targ}}\|^2, \]

where $\varphi(t)$ is the flow of the vector field $v(t) = \sum_{i=1}^n v_i(t)$.

The geometric version of the multiscale registration not only uses separate
vector fields for each scale, but also decomposes the diffeomorphisms according
to scales and can be defined as follows.

Definition 2.8 (LDDMM with a Semidirect Product). Registering $I_0$ to $I_{\mathrm{targ}}$
is done by finding the minimizing $n$-tuple $(v_1(t), \dots, v_n(t))$ of

\[ \frac{1}{2} \sum_{i=1}^n \int_0^1 \|v_i(t)\|_{K_i}^2 \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_{\mathrm{targ}}\|^2, \]

where $\varphi(t) = \psi_1(t) \circ \dots \circ \psi_n(t)$ and $\psi_i(t)$ is defined via (15).



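Problem 2.8 can be obtained from the abstract framework in [BGBHR11]
by considering the action

\[
\begin{aligned}
(\mathrm{Diff}_{K_1}(\Omega) \rtimes \cdots \rtimes \mathrm{Diff}_{K_n}(\Omega)) \times V &\to V \\
((\psi_1, \dots, \psi_n), I) &\mapsto I \circ \psi_n^{-1} \circ \dots \circ \psi_1^{-1}
\end{aligned}
\tag{17}
\]

of the semidirect product on the space of images.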

Theorem 2.9. The matching problems 2.6, 2.7 and 2.8 are all equivalent.

Proof. The equivalence of problems 2.6 and 2.7 follows from Proposition 2.3,
while the equivalence of problems 2.7 and 2.8 follows from the diagram (16).
For the case $n = 2$ the proof can be found in more detail in [BGBHR11]. □

This construction can also be generalized to a continuum of scales as
introduced in [SNLP11] and [SLNP11]. We present in the next section a more
general framework to deal with such continuous multi-scale decompositions.

2.3 The Order Reversed

The action (17) of the semidirect product from Lemma 2.5 proceeds by
deforming the image with the coarsest scale diffeomorphism first and with the finest
scale diffeomorphism last. However, it is also possible to reverse this order and
to act with the finest scale diffeomorphisms first. We will see that this approach
also corresponds to a semidirect product and is equivalent to the other ordering
of scales via a change of coordinates. The reason to expand on this here is that
it is better suited to be generalized to a continuum of scales.

In this section we will assume that the group $G_1$ contains the deformations of
the coarsest scale and $G_n$ those of the finest scale. The corresponding semidirect
product is described in the following lemma.

Lemma 2.10. Let $G_1 \subset G_2 \subset \dots \subset G_n$ be a chain of Lie groups. The $n$-fold
semidirect product is given by the multiplication

\[ (g_1, \dots, g_n) \cdot (h_1, \dots, h_n) = \big( g_1 h_1, \; c_{h_1^{-1}} g_2 \, h_2, \; \dots, \; c_{(h_1 \cdots h_{n-1})^{-1}} g_n \, h_n \big) \]

with $c_g h = g h g^{-1}$ denoting conjugation. Then, given a right-invariant velocity
field $v(t) = (v_1(t), \dots, v_n(t))$, the curve $g(t)$ is reconstructed via

\[ \partial_t g_k(t) = \big( \mathrm{Ad}_{(g_1(t) \cdots g_{k-1}(t))^{-1}} v_k(t) \big) g_k(t), \]

which is the semidirect product equivalent of $\partial_t g(t) = v(t) g(t)$. We shall denote
this product by

\[ G_1 \ltimes \cdots \ltimes G_n \]

to emphasize that each subproduct $G_1 \ltimes \cdots \ltimes G_k$ from the left is a normal subgroup
of the whole product.

Proof. This lemma can be proven in the same way as Lemma 2.5. □

The semidirect products defined in Lemmas 2.5 and 2.10 are equivalent, as
shown by the following lemma.

Lemma 2.11. The map

\[
\begin{aligned}
\varphi : G_1 \rtimes \cdots \rtimes G_n &\to G_n \ltimes \cdots \ltimes G_1 \\
(g_1, \dots, g_n) &\mapsto \big( g_n, \; \dots, \; c_{(g_{n+2-k} \cdots g_n)^{-1}} g_{n+1-k}, \; \dots, \; c_{(g_2 \cdots g_n)^{-1}} g_1 \big)
\end{aligned}
\]

is a group homomorphism between the two semidirect products and its derivative
at the identity is given by

\[
\begin{aligned}
T_e \varphi : \mathfrak{g}_1 \rtimes \cdots \rtimes \mathfrak{g}_n &\to \mathfrak{g}_n \ltimes \cdots \ltimes \mathfrak{g}_1 \\
(v_1, \dots, v_n) &\mapsto (v_n, \dots, v_{n+1-k}, \dots, v_1).
\end{aligned}
\]

Proof. Direct computation. □
The map $\varphi$ can be seen as one side of the following commuting triangle

\[
\begin{array}{ccc}
G_1 \rtimes \cdots \rtimes G_n & \xrightarrow{\;\; \varphi \;\;} & G_n \ltimes \cdots \ltimes G_1 \\
 & {\scriptstyle T_1} \searrow \qquad \swarrow {\scriptstyle T_2} & \\
 & G_1 \times \cdots \times G_n &
\end{array}
\]

The maps

\[
\begin{aligned}
T_1(g_1, \dots, g_n) &= (g_1 \cdots g_n, \; \dots, \; g_n) \\
T_2(g_1, \dots, g_n) &= (g_n, \; g_{n-1} g_n, \; \dots, \; g_1 \cdots g_n)
\end{aligned}
\]

are group homomorphisms from the corresponding semidirect products into the
direct product $G_1 \times \cdots \times G_n$. They can be regarded as trivializations of the
semidirect product in the special case that the factors form a chain of subgroups.
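The homomorphism property of Lemma 2.11 can be checked numerically in the same matrix toy model. This is a minimal sketch assuming that all $G_k$ coincide with a group of invertible $3 \times 3$ matrices; `mul_first`, `mul_second` and `phi` are hypothetical names for the multiplications of Lemmas 2.5 and 2.10 and for the map of Lemma 2.11.

```python
import numpy as np
from functools import reduce

def conj(g, h):
    """Conjugation c_g h = g h g^{-1}."""
    return g @ h @ np.linalg.inv(g)

def product(mats, dim):
    """Ordered product of a list of matrices (identity for an empty list)."""
    return reduce(np.matmul, mats, np.eye(dim))

def mul_first(gs, hs):
    """Lemma 2.5: (g.h)_k = g_k c_{g_{k+1}...g_n} h_k."""
    dim = gs[0].shape[0]
    return [g @ conj(product(gs[k + 1:], dim), h)
            for k, (g, h) in enumerate(zip(gs, hs))]

def mul_second(gs, hs):
    """Lemma 2.10: (g.h)_k = (c_{(h_1...h_{k-1})^{-1}} g_k) h_k."""
    dim = gs[0].shape[0]
    return [conj(np.linalg.inv(product(hs[:k], dim)), g) @ h
            for k, (g, h) in enumerate(zip(gs, hs))]

def phi(gs):
    """Lemma 2.11: k-th entry is c_{(g_{n+2-k}...g_n)^{-1}} g_{n+1-k}."""
    n, dim = len(gs), gs[0].shape[0]
    return [conj(np.linalg.inv(product(gs[n - j:], dim)), gs[n - 1 - j])
            for j in range(n)]

def close(xs, ys):
    return all(np.allclose(x, y) for x, y in zip(xs, ys))

rng = np.random.default_rng(4)
g = [np.eye(3) + 0.1 * rng.standard_normal((3, 3)) for _ in range(3)]
h = [np.eye(3) + 0.1 * rng.standard_normal((3, 3)) for _ in range(3)]

print(close(phi(mul_first(g, h)), mul_second(phi(g), phi(h))))
```

The same tuples can be used to check the trivialization: the partial products of `phi(g)` reproduce $(g_n, g_{n-1} g_n, \dots, g_1 \cdots g_n)$.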

We will now assume that we are given $n$ kernels $K_1, \dots, K_n$ such that the
inclusions $H_{K_i} \subset H_{K_{i+1}}$ hold, i.e. $K_1$ represents the coarsest scale and $K_n$ the finest
one. Note that the inclusions are reversed as compared to Section 2.2. The
registration problem is now defined as follows.

Definition 2.12 (LDDMM with the alternative Semidirect Product).
Registering $I_0$ to $I_{\mathrm{targ}}$ is done by finding the minimizing $n$-tuple $(v_1(t), \dots, v_n(t))$
of

\[ \frac{1}{2} \sum_{i=1}^n \int_0^1 \|v_i(t)\|_{K_i}^2 \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_{\mathrm{targ}}\|^2, \]

where $\varphi(t) = \psi_1(t) \circ \dots \circ \psi_n(t)$ and $\psi_k(t)$ is defined via

\[ \partial_t \psi_k(t) = \big( \mathrm{Ad}_{(\psi_1(t) \cdots \psi_{k-1}(t))^{-1}} v_k(t) \big) \circ \psi_k(t). \]

To see that problems 2.8 and 2.12 are equivalent, note that the following
diagram commutes

\[ G_1 \rtimes \cdots \rtimes G_n \xrightarrow{\;\; \varphi \;\;} G_n \ltimes \cdots \ltimes G_1 \]

and that $T_e \varphi$ merely reverses the order of the vector fields $(v_1(t), \dots, v_n(t))$.
We will see in Section 3.2 that this version of the semidirect product is better
suited to be generalized to a continuum of scales.

3 Extension to a continuum of scales 
3.1 The continuous mixture of kernels 

In this section, we define the multi-scale approach for a continuum of scales. 
First, we introduce the necessary analytical framework and useful results. 

Definition 3.1. Let $\Omega$ be a domain in $\mathbb{R}^d$. An admissible bundle $(\mathcal{H}, \lambda)$ is a
one-parameter family of Reproducing Kernel Hilbert Spaces of $C^1(\Omega)$ vector
fields $H_s$ indexed by $s \in \mathbb{R}_+^*$ such that for any $s$ there exists a positive constant
$M_s$ such that for every $v \in H_s$

\[ \|v\|_{1,\infty} \le M_s \|v\|_{H_s}. \tag{18} \]

The map $\mathbf{K} : \mathbb{R}_+^* \times \mathbb{R}^d \times \mathbb{R}^d \ni (s,x,y) \mapsto K_s(x,y) \in L(\mathbb{R}^d)$ is assumed to be
Borel measurable. In addition to this set of linear spaces, $\lambda$ is a Borel measure
on $\mathbb{R}_+^*$ such that

\[ \int_{\mathbb{R}_+^*} M_s^2 \, d\lambda(s) < +\infty, \tag{19} \]

where $s \mapsto M_s$ is also assumed to be Borel measurable.

Remark 3.2. • The hypotheses of the definition may not be optimal to
obtain the needed property, but this context is already large enough for
applications.

• Remark that no inclusion is a priori required between the linear spaces
$H_s$. However, the typical example is given by the usual scale-space, i.e.
$H_s$ defined by its Gaussian kernel $e^{-\|x-y\|^2 / (2 s^2)}$. In this case, there exists an
obvious towering $H_s \subset H_t$ for $s > t > 0$.

• We have $\int_{\mathbb{R}_+^*} |K_s(x,y)| \, d\lambda(s) \le \int_{\mathbb{R}_+^*} M_s^2 \, d\lambda(s)$. This comes from
Cauchy-Schwarz's inequality, which gives $\langle \alpha, K_s(x,y) \beta \rangle \le \|\delta_x^\alpha\|_{H_s^*} \|\delta_y^\beta\|_{H_s^*}$, and the fact
that $\|\delta_x^\alpha\|_{H_s^*} \le |\alpha| M_s$.

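Mimicking Section 2, we consider the set of vector-valued functions defined
on $\mathbb{R}_+^* \times \Omega$, namely, denoting by $\mu$ the Lebesgue measure on $\Omega$,

\[
V := \left\{ f \in L^2(\mathbb{R}_+^* \times \Omega, \lambda \otimes \mu) \;\middle|\;
\begin{array}{l}
(1)\ \forall x \in \Omega, \ s \mapsto f(s,x) \text{ is measurable,} \\
(2)\ s \mapsto \|f(s, \cdot)\|_{H_s} \text{ is measurable,} \\
(3)\ \|f\|_V^2 := \int_{\mathbb{R}_+^*} \|f(s, \cdot)\|_{H_s}^2 \, d\lambda(s) < +\infty
\end{array}
\right\}.
\tag{20}
\]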

It is rather standard to prove that $V$ is a Hilbert space for the norm introduced
in (20). Remark that $V$ contains all the functions $(s,x) \mapsto K_s(x, \cdot)$ for any
$x \in \Omega$. Directly from the assumptions on the space $V$, we can define the set of
functions

\[ H := \Big\{ x \mapsto \int_{\mathbb{R}_+^*} v_s(x) \, d\lambda(s) \;\Big|\; v \in V \Big\}. \tag{21} \]



Then, the generalization of Lemma 2.1 [Sai88, Sch64] reads in our situation:

Theorem 3.3. The space $H$ can be endowed with the following norm: for any
$g \in H$,

\[ \|g\|_H^2 = \inf \int_{\mathbb{R}_+^*} \|f_s\|_{H_s}^2 \, d\lambda(s), \tag{22} \]

for $f \in V$ satisfying the constraint $g = \int_{\mathbb{R}_+^*} f_s \, d\lambda(s)$. This norm turns $H$ into a
RKHS whose kernel is $\mathcal{K}(x,y) = \int_{\mathbb{R}_+^*} K_s(x,y) \, d\lambda(s)$.

In our case, the hypothesis on the bundle $(\mathcal{H}, \lambda)$ implies that $H$ is an
admissible RKHS.

Proof. As mentioned above, the map

\[ \pi : V \to H, \qquad v \mapsto \int_{\mathbb{R}_+^*} v_s \, d\lambda(s) \]

is well defined and so is $\pi^* : H^* \to V^*$. Using Cauchy-Schwarz's inequality, we
have

\[ |\pi(v)(x)| \le \int_{\mathbb{R}_+^*} |v_s(x)| \, d\lambda(s) \le \int_{\mathbb{R}_+^*} \|v_s\|_{H_s} M_s \, d\lambda(s) \le \|v\|_V \sqrt{\int_{\mathbb{R}_+^*} M_s^2 \, d\lambda(s)}. \tag{23} \]

Hence, the evaluation map $\delta_x \circ \pi : v \in V \mapsto \int_{\mathbb{R}_+^*} v_s(x) \, d\lambda(s)$ is continuous
on $V$. Therefore the map $\pi$ is continuous for the product topology on $(\mathbb{R}^d)^\Omega$
and its kernel $\ker \pi$ is a closed subspace. Applying the projection
theorem on $(\ker \pi)^\perp$, for any $u \in H$ there exists a unique $\tilde{u} \in V$ such that

\[ \|\tilde{u}\|_V^2 = \inf_{v \in V} \{ \|v\|_V^2 \mid \pi(v) = u \}. \]

Hence, Equation (22) defines a norm on $H$ for which
$\pi|_{(\ker \pi)^\perp}$ is an isometry. In particular $H$ is a Hilbert space.

Remark that Inequality (23) means that $H$ is a RKHS. We can now apply
the Lagrange multiplier rule in a Hilbert space on the following functional

\[ \frac{1}{2} \|v\|_V^2 + (p, \, u - \pi(v))_{H^*,H}, \tag{24} \]

where $(u,p) \in H \times H^*$ and $v \in V$. For a stationary point, we obtain $\pi(v) = u$ and
$K_s \pi^* p(s) = v_s$, $\lambda$ a.e. The Riesz isomorphism is then given by

\[ K : p \in H^* \mapsto \int_{\mathbb{R}_+^*} K_s \pi^*(p)(s) \, d\lambda(s) \in H, \tag{25} \]

and the kernel function is given by

\[ (\delta_x, K \delta_y) = \int_{\mathbb{R}_+^*} K_s(x,y) \, d\lambda(s). \tag{26} \]

□
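The integral kernel of Theorem 3.3 can be approximated by a finite mixture, which links it back to the sum of kernels of Section 2.1. This is a minimal sketch under stated assumptions: $\lambda$ is the Lebesgue measure on $[0,1]$, $K_s$ is a scalar Gaussian kernel of width $s$ on one-dimensional points, and the integral is discretized with a midpoint rule; the function names and the number of quadrature cells are hypothetical.

```python
import numpy as np

def gaussian_gram(x, s):
    """Gram matrix of the Gaussian kernel exp(-|x - y|^2 / (2 s^2))."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * s ** 2))

def integral_kernel_gram(x, n_cells=50):
    """Midpoint-rule approximation of int_0^1 K_s(x_i, x_j) ds."""
    scales = (np.arange(n_cells) + 0.5) / n_cells   # midpoints, all > 0
    weights = np.full(n_cells, 1.0 / n_cells)       # Lebesgue weights
    return sum(w * gaussian_gram(x, s) for s, w in zip(scales, weights))

x = np.linspace(0.0, 1.0, 10)
K = integral_kernel_gram(x)

# The quadrature is a weighted sum of Gram matrices, i.e. a finite mixture
# of kernels, so it should be symmetric and positive semi-definite.
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min())
```

Refining the quadrature only changes the weights and scales of the mixture, so each discretization is itself an instance of the finite sum of kernels.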

Remark 3.4. Importantly, the hypothesis on the bundle $(\mathcal{H}, \lambda)$ implies that $H$
is an admissible RKHS, since we can apply a theorem of differentiation under
the integral sign. By application of Cauchy-Schwarz's inequality to the integral
defining $\pi(v)$ for any $v \in V$, we have that

\[ \|\pi(v)\|_{1,\infty}^2 \le \int_{\mathbb{R}_+^*} M_s^2 \, d\lambda(s) \int_{\mathbb{R}_+^*} \|v_s\|_{H_s}^2 \, d\lambda(s). \tag{27} \]



3.2 Scale decomposition 

In this section we will generalize the ideas of Section 2.2 from a finite sum of
kernels to a continuum of scales. We will make some more assumptions on the
general setting introduced in Section 3.1.

Assumption. We assume that the measure $\lambda(s)$ is the Lebesgue measure on
the finite interval $[0, 1]$ and that the family $H_s$ of RKHS is ordered by inclusion:

\[ H_s \subset H_t \quad \text{for } s < t. \]

This assumption might be relaxed slightly. The main restriction is that
we do not want the measure $\lambda$ to contain any singular parts. As long as $\lambda(s)$
is absolutely continuous with respect to the Lebesgue measure, i.e. it can be
represented via a density $d\lambda(s) = f(s) \, ds$, the same construction can be carried
out. The ordering of the inclusions corresponds to that in Section 2.3.

As in the discrete setting we can formulate the two image matching problems.
The first is a direct generalization of problem 2.6 to a continuum of scales.



Definition 3.5 (LDDMM with an Integral Kernel). Registering $I_0$ to $I_{\mathrm{targ}}$ is
done by finding the minimizer $v(t)$ of

\[ \frac{1}{2} \int_0^1 \|v(t)\|_K^2 \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_{\mathrm{targ}}\|^2, \]

where $K = \int_0^1 K_s \, ds$ is the integral over the scales $K_s$.

The other problem associates to each scale a separate vector field. It was
proposed in [SNLP11, SLNP11], where it was called registration with a kernel
bundle. The term kernel bundle refers to the one-parameter family $\mathcal{H} = (H_s)_{s \in [0,1]}$
of RKHS.

Definition 3.6 (LDDMM with a Kernel Bundle). Registering $I_0$ to $I_{\mathrm{targ}}$ is
done by finding the one-parameter family $v_s(t)$ of vector fields which minimizes

\[ \frac{1}{2} \int_0^1 \int_0^1 \|v_s(t)\|_{K_s}^2 \, ds \, dt + \frac{1}{2\sigma^2} \|\varphi(1).I_0 - I_{\mathrm{targ}}\|^2, \]

where $\varphi(t)$ is the flow of the vector field $v(t) = \int_0^1 v_s(t) \, ds$.

These two problems are equivalent, as will be shown in Theorem 3.11. As
a next step we want to obtain a geometric reformulation of the registration
problem similar to 2.8 or 2.12. The goal of this reformulation is to decompose
the minimizing flow of diffeomorphisms $\varphi(t)$, such that the effect of each scale
becomes visible. In order to do this decomposition we define

\[ \psi_s(t) \text{ to be the flow of } \int_0^s v_r(t) \, dr. \tag{28} \]

The following theorem allows us to interchange time and scale in the flow $\psi_s(t)$.

Theorem 3.7. For each fixed $t$, the one-parameter family $s \mapsto \psi_s(t)$ is the flow
of

\[ \mathrm{Ad}_{\psi_s(t)} \int_0^t \mathrm{Ad}_{\psi_s(r)^{-1}} v_s(r) \, dr. \]

To prove this theorem we will use the following lemma.

Lemma 3.8. Let $u(s,t,x)$ and $v(s,t,x)$ be two-parameter families of vector
fields which are $C^2$ in the $(s,t)$-variables and $C^1$ in $x$. If they satisfy

\[ \partial_s u(s,t,x) - \partial_t v(s,t,x) = [v(s,t), u(s,t)](x) \]

and if $v(s,0) = 0$ for all $s$, then the flow of $u(s, \cdot)$ for fixed $s$ coincides with the
flow of $v(\cdot, t)$ for fixed $t$.

Proof. Denote by $a_s(t)$ the flow of $u(s, \cdot)$ in $t$. Then

\[
\begin{aligned}
\partial_t \partial_s a_s(t) &= \partial_s \partial_t a_s(t) = \partial_s (u(s,t) \circ a_s(t)) \\
&= \partial_s u(s,t) \circ a_s(t) + Du(s,t,a_s(t)).\partial_s a_s(t) \\
&= \partial_t v(s,t) \circ a_s(t) + [v(s,t), u(s,t)] \circ a_s(t) + Du(s,t,a_s(t)).\partial_s a_s(t) \\
&= \partial_t (v(s,t) \circ a_s(t)) + Du(s,t,a_s(t)).(\partial_s a_s(t) - v(s,t) \circ a_s(t)).
\end{aligned}
\]

This implies that $b_s(t) := \partial_s a_s(t) - v(s,t) \circ a_s(t)$ is the solution of the ODE

\[ \partial_t b_s(t) = Du(s,t,a_s(t)).b_s(t). \tag{29} \]

Since for $t = 0$ we have $b_s(0) = \partial_s a_s(0) - v(s,0) \circ a_s(0) = 0$, it follows that
$b_s(t) \equiv 0$ is the unique solution of (29). This means that

\[ \partial_s a_s(t) = v(s,t) \circ a_s(t), \]

i.e. the flows of $u(s, \cdot)$ in $t$ and of $v(\cdot, t)$ in $s$ coincide. □



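Proof of Theorem. We apply Lemma 3.8 to the vector fields

\[ u(s,t) = \int_0^s v_r(t) \, dr \quad \text{and} \quad v(s,t) = \mathrm{Ad}_{\psi_s(t)} \int_0^t \mathrm{Ad}_{\psi_s(r)^{-1}} v_s(r) \, dr. \]

We can differentiate $\mathrm{Ad}$ using the following rule:

\[ \partial_t \mathrm{Ad}_{g(t)} u = [\partial_t g(t) g(t)^{-1}, \mathrm{Ad}_{g(t)} u]. \]

Using this we can verify the compatibility condition

\[
\begin{aligned}
& \partial_s \Big( \int_0^s v_r(t) \, dr \Big) - \partial_t \Big( \mathrm{Ad}_{\psi_s(t)} \int_0^t \mathrm{Ad}_{\psi_s(r)^{-1}} v_s(r) \, dr \Big) \\
&\quad = v_s(t) - \Big[ \partial_t \psi_s(t) \psi_s(t)^{-1}, \, \mathrm{Ad}_{\psi_s(t)} \int_0^t \mathrm{Ad}_{\psi_s(r)^{-1}} v_s(r) \, dr \Big] - v_s(t) \\
&\quad = - \Big[ \int_0^s v_r(t) \, dr, \, \mathrm{Ad}_{\psi_s(t)} \int_0^t \mathrm{Ad}_{\psi_s(r)^{-1}} v_s(r) \, dr \Big].
\end{aligned}
\]

The condition $v(s, 0) = 0$ is trivially satisfied. This concludes the proof. □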

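The main conclusion from Theorem 3.7 can be summarized in the following corollary.

Corollary 3.9. Let $\varphi(t)$ be the flow of the vector field $\int_0^1 v_s(t) \, ds$ and define
$\eta(s)$ via the left-invariant velocity, i.e.

\[ \eta(s)^{-1} \partial_s \eta(s) = \int_0^1 \mathrm{Ad}_{\psi_s(t)^{-1}} v_s(t) \, dt. \]

Then we have

\[ \varphi(1) = \eta(1), \]

i.e. $\eta(s)$ is the flow in scale of $\varphi(t)$.

Proof. Apply Theorem 3.7 at time $t = 1$: then $\eta(s) = \psi_s(1)$, and at the final
scale $s = 1$ we get $\eta(1) = \psi_1(1) = \varphi(1)$. □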



Corollary 3.9 gives us a way to decompose the matching diffeomorphism $\varphi(1)$
into separate scales. As we follow the flow $\eta(s)$, we add more and more scales,
starting from the identity, when no scales are taken into account, and finishing
with $\varphi(1)$, which includes all scales. In this sense $\eta(t) \circ \eta(s)^{-1}$ contains the scale
information for the scales in the interval $[s,t]$.

3.3 Restriction to a Finite Number of Scales

It is of interest to understand the relation between a continuum of scales and
the case where we have only a finite number. We will see that it is possible to
see the discrete case as an approximation of the continuous case.

Let us start with a family $K_s$ of kernels with $s \in [0, 1]$, where the scales are
ordered from the coarsest to the finest, i.e. $H_{K_s} \le H_{K_t}$ for $s < t$ as before.
Divide the interval $[0, 1]$ into $n$ parts $0 = t_0 < \dots < t_n = 1$ and denote the
intervals by $I_k = [t_{k-1}, t_k]$. Define the space $L^2(\mathcal{H})$, where $\mathcal{H}$ was defined in
Definition 3.1, to be the space of one-parameter families of vector
fields $v_s$ such that $v_s \in H_s$. To each interval $I_k$ corresponds a kernel $K_k =
\int_{I_k} K_s \, ds$. The discrete sampling map

\[
\begin{aligned}
\Psi : L^2(\mathcal{H}) &\to H_{K_1} \times \cdots \times H_{K_n} \\
v_s &\mapsto \Big( \int_{I_1} v_s \, ds, \dots, \int_{I_n} v_s \, ds \Big)
\end{aligned}
\]

discretizes $v_s$ into $n$ scales. Formally we can introduce a Lie bracket on the
space $L^2(\mathcal{H})$ by defining

\[ [u, v]_s = \Big[ u_s, \int_0^s v_r \, dr \Big] + \Big[ \int_0^s u_r \, dr, \, v_s \Big]. \tag{30} \]



Using this bracket the sampling map $\Psi$ is a Lie algebra homomorphism, as shown
in the next theorem.

Theorem 3.10. The sampling map $\Psi$ is a Lie algebra homomorphism from
the Lie algebra $L^2(\mathcal{H})$ with the bracket defined in (30) into the $n$-fold semidirect
product with the bracket

\[ [(u_1, \dots, u_n), (v_1, \dots, v_n)] = \Big( \dots, \; \Big[ u_k, \sum_{i=1}^{k-1} v_i \Big] + \Big[ \sum_{i=1}^{k-1} u_i, v_k \Big] + [u_k, v_k], \; \dots \Big). \]



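Proof. Using the definitions we first compute

\[
\begin{aligned}
[\Psi(u), \Psi(v)]_k &= \Big[ \int_{t_{k-1}}^{t_k} u_s \, ds, \int_0^{t_{k-1}} v_s \, ds \Big] + \Big[ \int_0^{t_{k-1}} u_s \, ds, \int_{t_{k-1}}^{t_k} v_s \, ds \Big] \\
&\quad + \Big[ \int_{t_{k-1}}^{t_k} u_s \, ds, \int_{t_{k-1}}^{t_k} v_s \, ds \Big]
\end{aligned}
\]

and then write the other side

\[ \Psi([u,v])_k = \int_{t_{k-1}}^{t_k} \Big[ u_s, \int_0^s v_r \, dr \Big] + \Big[ \int_0^s u_r \, dr, \, v_s \Big] \, ds. \]

Below we interchange the order of integration in the first summand and merely
switch $s$ and $r$ in the second summand to obtain

\[
\begin{aligned}
\int_0^{t_k} \Big[ u_s, \int_0^s v_r \, dr \Big] + \Big[ \int_0^s u_r \, dr, \, v_s \Big] \, ds
&= \int_0^{t_k} \int_0^s [u_s, v_r] \, dr \, ds + \int_0^{t_k} \int_0^s [u_r, v_s] \, dr \, ds \\
&= \int_0^{t_k} \int_r^{t_k} [u_s, v_r] \, ds \, dr + \int_0^{t_k} \int_0^r [u_s, v_r] \, ds \, dr \\
&= \int_0^{t_k} \int_0^{t_k} [u_s, v_r] \, ds \, dr \\
&= \Big[ \int_0^{t_k} u_s \, ds, \int_0^{t_k} v_s \, ds \Big].
\end{aligned}
\]

Decomposing the integral into

\[ \Psi([u,v])_k = \int_0^{t_k} (\dots) \, ds - \int_0^{t_{k-1}} (\dots) \, ds \]

finishes the proof. □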



Now we can show that all matching problems that we defined in the
continuous case are equivalent.

Theorem 3.11. The matching problems 3.5 and 3.6 are equivalent and, using
the sampling map $\Psi$, they are also equivalent to the discrete problem 2.12.

Proof. The first equivalence follows from Theorem 3.3. For the second
equivalence note that problem 2.12 is equivalent to 2.6 and use Lemma 2.1. □



The diffeomorphisms $\psi_k$ at each scale, which were defined in 2.12, are also
contained in the continuous setting. Note that $\psi_1(t) \circ \dots \circ \psi_k(t)$ was the flow
of the vector field $v_1(t) + \dots + v_k(t)$ and using the map $\Psi$ we have

\[ v_1(t) + \dots + v_k(t) = \int_0^{t_k} v_s(t) \, ds. \]

Hence we obtain the identity $\psi_1(t) \circ \dots \circ \psi_k(t) = \psi_{t_k}(t)$, where $\psi_{t_k}$ was defined
in (28). In particular we retrieve

\[ \psi_k(t) = \psi_{t_{k-1}}(t)^{-1} \circ \psi_{t_k}(t), \]

the scale decomposition of the discrete case, as a continuous scale decomposition
evaluated at specific points.



4 Conclusion and outlook 

In this paper, we have extended the mixture of kernels presented in [RVW+11]
to the continuous setting and we have given a variational interpretation of the
matching problem. In particular, we have shown that the approaches presented
in [SNLP11, SLNP11] and the mixture of kernels of [RVW+11] are equivalent.
Motivated by the mathematical development of the multi-scale approach to
groups of diffeomorphisms, we have extended the semi-direct product result of
[BGBHR11] to a finite number of scales and also to a continuum of scales.

Developing statistics in a multi-scale framework for the initial momentum is
encouraged by previous results on the statistical use of each scale.
To this aim, it would be very interesting to obtain results about the
sparsity of the scales used in the mixture of kernels in order to improve the
statistical power on the initial momentum. In analogy with [KBS+09] but from
the image registration point of view, [RVW+11] tends to favor a non-sparse
description of the scales, which tends towards the continuum of scales. In this
direction, a more theoretical approach to learning the parameters involved in
the mixture of kernels still needs to be developed.